ABSTRACT

A server system for processing client requests received over
a communication network includes a cluster of N document
servers and at least one redirection server. The redirection
server receives a client request from the network and redirects
it to one of the document servers, based on a set of
pre-computed redirection probabilities. Each of the document
servers may be an HTTP server that manages a set of
documents locally and can service client requests only for
the locally-available documents. A set of documents is
distributed across the document servers in accordance with
a load distribution algorithm which may utilize the access
rates of the documents as a metric for distributing the
documents across the servers and determining the redirection
probabilities. The load distribution algorithm attempts
to equalize the sum of the access rates of all the documents
stored at a given document server across all of the document
servers. In the event of a server failure, the redirection
probabilities may be recomputed such that the load of client
requests is approximately balanced among the remaining
document servers. The redirection probabilities may also be
recomputed periodically in order to take into account
changes in document access rates and changes in server
capacity. The recomputation may be based on a maximum-flow
minimum-cost solution of a network flow problem.

57 Claims, 5 Drawing Sheets 
DATA DISTRIBUTION TECHNIQUES FOR LOAD-BALANCED FAULT-TOLERANT WEB ACCESS

FIELD OF THE INVENTION

The present invention relates generally to server systems
for use in processing client requests received over communication
networks such as the Internet and more particularly
to server-side techniques for processing client requests in a
server system.

BACKGROUND OF THE INVENTION

Many server-side techniques have been proposed to
increase the throughput and scalability of web servers and to
decrease the request latency for clients. In an exemplary
server-side technique described in M. T. Kwan et al.,
"NCSA's World Wide Web Server: Design and
Performance," IEEE Computer, pp. 68-74, November 1995,
independent web servers use a distributed file system known
as Andrew File System (AFS) to access documents
requested by the clients. A round robin Domain Name
Service (DNS) is used to multiplex requests to the web
servers. In this server system architecture, although the
throughput is increased by balancing the load across the
servers through multiplexing, a high degree of load balance
may not be achieved due to DNS name caching at different
places in the network. This DNS name caching will also
prevent the clients from tolerating server failures.

Another approach that uses AFS is described in M.
Garland et al., "Implementing Distributed Server Groups
for the World Wide Web," Technical Report CMU-CS-95-114,
School of Computer Science, Carnegie Mellon
University, January 1995. In this approach, a front-end
server, called a dispatcher, is used to dispatch a request to
one of a number of back-end document servers. The dispatcher
monitors the load on the document servers and based
on this information determines which server should service
a given incoming client request. The document servers have
access to all the requested documents by using the AFS.
Unfortunately, these and other approaches based on AFS are
limited by the need for the web servers either to go across
the network through the file servers to fetch the document,
as in the NCSA server, or to store all the documents locally.

The SWEB approach described in D. Andresen et al.,
"SWEB: Towards a Scalable World Wide Web Server on
Multicomputers," Department of Computer Science Tech
Report TRCS95-17, U.C. Santa Barbara, September 1995,
uses distributed memory machines and a network of
workstations as web servers. The servers do not each locally
store all the documents, but can instead go over a LAN to
fetch documents that are requested but are not locally
available. At the front end, a round robin DNS is used to
direct a request to one of the web servers. This web server
then uses a pre-processing step to determine whether the
request should be serviced locally or should be redirected to
another server. The redirection decision is made based on a
dynamic scheduling policy that considers parameters such as
CPU load, network latency and disk load. If a decision is
made to service a request locally, and if the document is not
available locally, an appropriate server is chosen from which
the document is fetched. If a decision is made not to service
the request locally, another server is chosen and the client is
redirected to that server using HTTP redirection. This system
is scalable and does not require each server to locally
store all documents. Although this system alleviates the
problem of DNS name caching through the use of server
redirection, the increase in throughput is still limited by the
dynamic redirection and the need to go over the network to
fetch documents. Furthermore, failures are still a problem
due to the use of DNS name caching.

The "One-IP" approach described in O. P. Damani, P.-Y.
Chung, Y. Huang, C. Kintala, Y.-M. Wang, "ONE-IP: Techniques
for Hosting a Service on a Cluster of Machines,"
Sixth International World Wide Web Conference, Santa
Clara, April 1997, and U.S. patent application Ser. No.
08/818,989 filed Mar. 14, 1997, distributes requests to
different servers in a cluster by dispatching packets at the
Internet Protocol (IP) level. A dispatcher redirects requests
to the different servers based on the source IP address of the
client. The One-IP approach provides a low-overhead scalable
solution, but a potential drawback is that the load may
not be optimally balanced if arriving requests do not have
source IP addresses that are reasonably random.

An approach referred to as "TCPRouter" is described in
D. Dias et al., "A Scalable and Highly Available Server,"
COMPCON '96, pp. 85-92, 1996. This approach publicizes
the address of the server side router which receives the client
requests, and dispatches the request to an appropriate server
based on load information. The destination address of each
IP packet is changed by the router before dispatching. This
means that the kernel code of every server in the cluster
needs to be modified, although the approach can provide
fault-tolerance and load balancing in certain applications.

A number of other server-side techniques are based on
caching or mirroring documents on geographically distributed
sites. See, for example, J. Gwertzman and M. Seltzer,
"The Case for Geographical Push-Caching," HotOS '95,
1995, A. Bestavros, "Speculative Data Dissemination and
Service to Reduce Server Load, Network Traffic and Service
Time in Distributed Information Systems," Proceedings of
the International Conference on Data Engineering, March
1996, and A. Heddaya and S. Mirdad, "WebWave: Globally
Load Balanced Fully Distributed Caching of Hot Published
Documents," Computer Science Technical Report, BU-CS-96-024,
Boston University, October 1996. These techniques
are generally referred to as geographic push caching or
server side caching. Client requests are sent to a home server
which then redirects the request to a proxy server closer to
the client. The redirection can be based on both geographic
proximity and the load on the proxies. Dissemination of
document information from the home server is used to keep
the caches consistent. These techniques are limited in their
scalability because of the need for keeping caches consistent.
Furthermore, the load balancing achieved may be
limited if the location information of the document is cached
at the client. Fault-tolerance is also an issue as it will be
difficult for the home server to keep dynamic information
about faults.

SUMMARY OF THE INVENTION

The invention provides improved server-side techniques
for processing client requests received over the Internet and
other communication networks, without the problems associated
with the above-described conventional approaches.
An illustrative embodiment is a scalable and fault-tolerant
web server system which utilizes HTTP redirection. The
system includes a set of N document servers and one or
more redirection servers which receive HTTP requests from
clients and redirect the requests to the document servers in
accordance with pre-computed redirection probabilities. A
load distribution algorithm is used for initial distribution of
a set of documents across the servers and determination of
the redirection probabilities. Given a specific degree of
document replication k, the load distribution algorithm
ensures that at least k replicas of each document are present
after document distribution is complete. The algorithm can
also ensure that for all the documents combined no more
than N-1 redundant replicas will exist in the system. The
redirection servers redirect requests to one of the replicas
with the corresponding redirection probability. The load
distribution algorithm together with this redirection mechanism
ensures that the load is properly balanced across the N
document servers.

In embodiments which utilize replicated copies of
documents, the redirection probabilities may be recomputed
periodically using an algorithm based on network flow. For
example, if a given document server fails, the redirection
server can use recomputed redirection probabilities to ensure
that the load can be approximately balanced among the
remaining servers without any need for moving documents
among them. This allows for graceful degradation of service
in the event of server failure. The redirection probabilities
may also be recomputed in the event of other changes
affecting the server system, including changes in document
access probabilities and changes in server capacities.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating an exemplary
scalable and fault-tolerant web server system in accordance
with the invention;

FIG. 2 illustrates the operation of a portion of the web
server system of FIG. 1 as applied to an exemplary set of
documents and client requests; and

FIGS. 3, 4, 5 and 6 show flow network diagrams of
examples illustrating the operation of the web server system
of FIG. 1.

DETAILED DESCRIPTION OF THE INVENTION

The present invention will be illustrated below in conjunction
with exemplary client/server connections established
over the Internet using the Transmission Control
Protocol/Internet Protocol (TCP/IP) standard. It should be
understood, however, that the invention is not limited to use
with any particular type of network or network communication
protocol. The disclosed techniques are suitable for use
with a wide variety of other networks and protocols. The
term "web" as used herein is intended to include the World
Wide Web, other portions of the Internet, or other types of
communication networks. The term "client request" refers to
any communication from a client which includes a request
for information from a server. A given request may include
multiple packets or only a single packet, depending on the
nature of the request. The term "document" as used herein
is intended to include web pages, portions of web pages,
computer files, or any other type of data including audio,
video and image data.

FIG. 1 shows an exemplary web server system 10 in
accordance with an illustrative embodiment of the invention.
The server system 10 includes a round-robin domain name
service (DNS) server 12, a pair of redirection servers 14-1
and 14-2, and a cluster 16 of N document servers S_1, S_2, ...,
S_N interconnected as shown. The server system 10 communicates
with one or more clients over TCP/IP connections
established over a network in a conventional manner. Each
of the elements of server system 10 may include a processor
and a memory. The system 10 is suitable for implementing
Hypertext Transfer Protocol (HTTP)-based network services
on the Internet. The HTTP protocol is described in greater
detail in "Hypertext Transfer Protocol -- HTTP/1.0," Network
Working Group, May 1996, <http://www.ics.uci.edu/pub/ietf/http>,
which is incorporated by reference herein.
For example, a client may generate an HTTP request for a
particular service hosted by the server system 10, such as a
request for information associated with a particular web site,
and a TCP/IP connection is then established between the
client and a particular one of the document servers 16 in the
server system 10. The requested network service may be
designated by a uniform resource locator (URL) which
includes a domain name identifying the server system 10 or
a particular one of the document servers 16 hosting the
service. The DNS server 12 maps a domain name in a client
request to an IP address of the appropriate server in the
system 10.

Each of the document servers 16 in the illustrative
embodiment of FIG. 1 may be an HTTP server that manages
a set of documents locally and can service client requests
only for the locally-available documents. The redirection
servers 14-1 or 14-2 redirect a request to a document server
only if a copy of the document is available at that server.
The document servers 16 are independent and need not
incur the overhead required to collaborate and retrieve a
document over a local area network, as in conventional
approaches such as the SWEB system described previously.
Also, because each document server S_i serves only a subset
of the available documents, the document server cache will
be utilized more efficiently than in a system in which each
document server is capable of serving all documents. As will
be described in detail below, the server system 10
makes use of a document distribution algorithm and a
redirection mechanism to balance the request load across the
document servers 16 and to also provide for fault-tolerance
and graceful degradation of performance through replication.
It should be noted that alternative embodiments of the
invention may be configured such that the documents served
by a given document server are not available locally on that
server. For example, the subset of the available documents
served by the given server may be retrieved from a file
server accessible over a local area network.

The round-robin DNS 12 of server system 10 multiplexes
client requests among the redirection servers 14-1, 14-2 such
that a single redirection server does not become a bottleneck.
Other types of DNS techniques may also be used. The
redirection servers 14-1, 14-2 redirect the incoming client
requests to a document server S_i that maintains the requested
document. Since the documents are permitted to be
replicated, more than one document server S_i may be a
candidate for servicing the client request. The redirection
servers 14-1, 14-2 use a redirection mechanism to be
described below in order to determine which document
server S_i should service a particular request. The redirection
mechanism may be based on the HTTP protocol and thus
can be supported by most currently available browsers and
servers. The redirection mechanism may use the access rates
of the documents as a metric for partitioning the documents
across the document servers. In one possible
implementation, it attempts to equalize the sum of the access
rates of all the documents stored at a given document server
across all document servers. It should be noted that this
redirection mechanism could be implemented at a higher
level of granularity, in which directories are distributed
instead of documents, and access rates of these directories
are balanced across a set of servers. As noted above, the term
"document" is intended to include without limitation any
type of data that may be distributed across multiple servers
in a server system.
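The redirection step described above, in which a redirection server picks one of the replicas of a requested document according to a pre-computed probability and answers with an HTTP redirect, can be sketched as follows. This is only an illustration under stated assumptions: the table contents, the host names, and the function names `choose_server` and `redirect_response` are hypothetical and do not come from the patent text.

```python
import random
from collections import Counter

# Hypothetical table: document path -> list of (server, redirection probability).
# The probabilities for each document are assumed to sum to 1.
REDIRECT_TABLE = {
    "/index.html": [("S1", 0.75), ("S2", 0.25)],
    "/news.html": [("S3", 1.0)],
}


def choose_server(path, table=REDIRECT_TABLE, rng=random):
    """Pick a document server for `path` using its redirection probabilities."""
    replicas = table[path]
    servers = [server for server, _ in replicas]
    weights = [prob for _, prob in replicas]
    # random.choices performs a weighted draw; k=1 returns a one-element list.
    return rng.choices(servers, weights=weights, k=1)[0]


def redirect_response(path, host_of, table=REDIRECT_TABLE, rng=random):
    """Build a minimal HTTP/1.0 redirect that points at the chosen server.

    `host_of` maps a server name to its (hypothetical) host name.
    """
    server = choose_server(path, table, rng)
    location = f"http://{host_of[server]}{path}"
    return f"HTTP/1.0 302 Moved Temporarily\r\nLocation: {location}\r\n\r\n"
```

Because each redirection server only needs the table, several redirection servers can hold identical copies of it and make independent draws.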



02/13/2004, EAST Version: 1.4.1 



6,070,191 


As noted above, an illustrative embodiment of the invention
utilizes access rate as a metric to balance load in a server
system. The access rate metric is particularly well-suited for
applications in which the document servers are dedicated to
providing web service. In such a system, balancing the load
based on conventional metrics such as central processing
unit (CPU) queue length, disk usage and the like may not
provide adequate load balancing. Given that each of the
document servers can generally support a limited number of
TCP connections, in a server system where a large number
of HTTP requests are expected, it is generally more appropriate
to have as a metric the minimization of the probability
that a request fails because the corresponding document
server is at its peak TCP connection load. For example,
assume that l is the maximum number of TCP connections
that a document server can support. If λ is the rate at which
HTTP requests come into the redirection server(s), the
documents should be distributed across the N back-end
document servers such that the rate of requests redirected to
each server is approximately λ/N.

In general, the document distribution problem can be
characterized as follows: Given a set of M documents with
access rates r_1, r_2, ..., r_M (total access rate r = r_1 + r_2 + ... + r_M)
and N servers S_1, S_2, ..., S_N which can support a maximum
number of simultaneous HTTP connections l_1, l_2, ..., l_N,
respectively (total maximum number of connections
l = l_1 + l_2 + ... + l_N), and the requirement that each document be
replicated on at least k document servers, distribute the
documents on the servers such that R_i ≈ r × l_i / l, where R_i is the sum
of the access rates of all documents on document server i.
Thus, the documents are distributed such that the load on
each server is proportional to its capacity in terms of the
maximum number of HTTP connections that it can support
simultaneously. If l_1 = l_2 = ... = l_N, all the document servers in
the cluster have the same capacity and such a cluster is
referred to as homogeneous, as opposed to a heterogeneous
cluster where the server capacities are non-uniform. In the
homogeneous cluster case, the blocking probability is minimized
if the access rates are made equal across all servers,
assuming that the average document size on a server is the
same across all the servers. If the document sizes are varied,
then the average connection time per request will be different
among requests and hence the length of the documents
may be taken into account by defining the "access rate" to
a document i as r_i = s_i × θ_i, where s_i is the size of the document
and θ_i is the rate of request to the document.

An algorithm to be described below, referred to as the
"binning" algorithm, may be used to provide an initial
document distribution that achieves load balance in a heterogeneous
server cluster with a value of k=1. This algorithm
may also be used as a basis for algorithms for values
of k>1. In order to balance the access rates, the binning
algorithm may create more than k replicas of a given
document. However, it will be shown that the total number
of redundant replicas that the algorithm creates over all
documents can always be made less than or equal to N-1.
The binning algorithm can be extended to the case of k>1 by
distributing an extra k-1 "dummy" replicas of each document
across the document servers. However, HTTP
requests only go to replicas which were created initially, and
no requests are actually serviced by the dummy replicas.
Let the access rates of the k replicas of document j be
denoted by r_j^1, r_j^2, ..., r_j^k. Then for each document, only
one of its replicas, referred to as the master replica, will
service all requests; i.e., r_j^1 = r_j and r_j^2 = ... = r_j^k = 0. For the case
of a homogeneous server cluster, the invention provides a
two-phase algorithm which uses the binning algorithm in
one of its phases to distribute the documents such that local
balance on replicas of each document is obtained, i.e., the
access rates of the replicas are made equal. It is believed that
this property enables graceful degradation of performance
and load in the event of failures. Similar to the binning
algorithm, the two-phase algorithm may lead to the possibility
that redundant replicas are distributed. It can be shown
that the number of redundant replicas for all of the documents
combined can be no more than N-1, which implies
that at most N-1 documents may have more than k replicas
in the system. The initial distribution algorithms are complemented
by the redirection mechanism at the redirection
server to achieve load balance.

As noted above, for a heterogeneous server cluster,
requests for a document may be redirected by the redirection
server to only one replica and for a homogeneous cluster
the requests may be redirected such that the replicas of the
documents have equal access rates. This means that, for each
document, the redirection server needs to maintain information
about the location of the replicas and then redirect a
request to the relevant replica based on the probabilities
determined by the initial distribution algorithm to be
described below. In embodiments including multiple redirection
servers, the redirection servers may each maintain
the same probability information for load balance across the
document servers, assuming that the round robin DNS 12
balances the load across the redirection servers.

The server system 10 of FIG. 1 can be configured to
provide graceful performance degradation in the presence of
document server failures with minimum reconfiguration of
the system. For example, in embodiments in which documents
are replicated and therefore available on more than
one server, the redirection mechanism noted above can be
used to configure the server system such that the initial
distribution of the documents need not be changed and no
document movement is required. When a given document
server fails, a network flow based algorithm may be used to
recompute the redirection probabilities to each of the replicas
of each of the documents in order to approximately
rebalance the load across the remaining document servers. It
should be noted that the server cluster may have some of the
documents replicated less than k times, because the illustrative
system is not configured to generate and place new
copies of documents that were on a failed server.
Furthermore, the above-noted properties that only one replica
will receive all requests for a document (in a heterogeneous
cluster) or that replicas of a document have equal
access rates (in a homogeneous cluster) will no longer hold
true in the event of server failure. However, once the failed
server is repaired, the initial redirection probabilities can be
reinstated.

The operation of server system 10 will first be described
for the case of a heterogeneous cluster with the requirement
that at least one copy of each document (i.e., k=1) is
available in some document server S_i in the cluster 16. As
described earlier, the cumulative access rate of document
server S_i should be R_i = r × l_i / l after document distribution.
The binning algorithm operates as follows. A random document
server S_i is picked and a random document j is picked
and placed on that server. After the access rate of the
document is mapped to the document server, if there is still
some residual capacity left in the server, i.e., its maximum
access rate (r × l_i / l) is not reached, then another document k
is randomly picked and is placed on this server. This process
continues until the total access rate to the document server
S_i upon the placement of a document m exceeds the maximum
access rate (r × l_i / l). At this stage, another document






server is randomly picked and document m is replicated on
that server. The portion of access rate δ of document m that
could not be mapped to server S_i is mapped onto this other
server. Thus, if each of the document servers is considered
to be a bin with a certain capacity, then the binning algorithm
fills each bin completely before another bin is chosen to be
filled. An exemplary set of pseudo-code for the above-described
binning algorithm is shown below.



Let U = {1, 2, ..., M} be the set of documents and
S = {S_1, S_2, ..., S_N} the set of servers.
Let R_i and l_i be the cumulative access rate and capacity of S_i respectively.
Let r_i be the total access rate to document i.
Randomly choose j in set U;
δ = r_j;
while S is not empty {
    Randomly choose S_i in set S;
    R_i = 0;
pic_doc: If (δ == 0) then {
        Randomly choose j in set U;
        Place j on S_i;
        δ = r_j;
    }
    Else
        Place j on S_i;
    If (R_i + δ > r × l_i/l) then {
        δ = δ - (r × l_i/l - R_i);
        R_i = r × l_i/l;
        S = S - {S_i};
    }
    Else if (R_i + δ == r × l_i/l) then {
        R_i = r × l_i/l;
        δ = 0;
        S = S - {S_i};
        U = U - {j};
    }
    Else if (R_i + δ < r × l_i/l) then {
        R_i = R_i + δ;
        δ = 0;
        U = U - {j};
        go to pic_doc;
    }
} end while
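The binning procedure above can also be expressed as a short runnable sketch. The version below is an illustration rather than the patent's own implementation: it assumes positive access rates, replaces the repeated random choices with a single shuffle of documents and of servers, uses a small floating-point tolerance `eps`, and the names `binning` and `redirection_probabilities` are hypothetical.

```python
import random


def binning(rates, capacities, rng=None, eps=1e-9):
    """Fill each server (bin) up to r * l_i / l before moving to the next one.

    rates[j]      : access rate r_j of document j (assumed positive)
    capacities[i] : maximum number of connections l_i of server i
    Returns placement[i] = {document: access rate mapped onto server i}.
    """
    rng = rng or random.Random()
    r, l = sum(rates), sum(capacities)
    docs = list(range(len(rates)))
    servers = list(range(len(capacities)))
    rng.shuffle(docs)      # stands in for "randomly choose j in set U"
    rng.shuffle(servers)   # stands in for "randomly choose S_i in set S"
    placement = {i: {} for i in servers}

    doc_iter = iter(docs)
    j = next(doc_iter, None)
    delta = rates[j] if j is not None else 0.0   # unmapped rate of document j
    for i in servers:
        room = r * capacities[i] / l             # residual capacity of bin i
        while j is not None and room > eps:
            share = min(delta, room)
            placement[i][j] = placement[i].get(j, 0.0) + share
            delta -= share
            room -= share
            if delta <= eps:                     # document exhausted: next one
                j = next(doc_iter, None)
                delta = rates[j] if j is not None else 0.0
        # If the bin filled up first, document j is replicated on the next bin.
    return placement


def redirection_probabilities(placement, rates):
    """Probability of redirecting a request for document j to server i."""
    probs = {}
    for i, docs in placement.items():
        for j, share in docs.items():
            probs.setdefault(j, {})[i] = share / rates[j]
    return probs
```

With rates `[5, 3, 2, 6, 4]` and capacities `[1, 1, 2]`, the three servers end up carrying 5, 5 and 10 units of access rate regardless of the shuffle, and at most N-1 = 2 documents are split across servers.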



In the above-described binning algorithm, when a document
is placed on a document server S_i, either the entire
access rate of the document can be completely mapped onto
that server without exceeding the server's capacity, or the
server's cumulative access rate will exceed the capacity, in
which case the document copy is replicated on the next
document server. Intuitively, the placement of a document
on a server can either exhaust the document access rate or
exhaust the server. A document is replicated only when a
server is exhausted without being able to accept the access
rate of that document completely. Each server can be
exhausted only once and each exhaustion does not necessarily
result in production of a replica. Therefore, in a system
with N document servers, there can be at most N replications.
However, when the last document server is exhausted,
all the access rate of the document which is currently being
mapped to it gets used up, since the sum of the capacities of
the servers equals the sum of the access rates r of all
documents in this embodiment. Exhaustion of the last server
therefore does not produce a replica, such that the maximum
number of replications is reduced from N to N-1.

In the case of a heterogeneous cluster with the requirement
that more than one copy of each document (i.e., k>1)
is available in some document server S_i in the cluster 16, the
above-described binning algorithm can be first used to place
at least one copy of each document on some document
server in the cluster in order to achieve load balancing for
the single copies, and then used to distribute k-1 replicas of
each of the documents randomly onto the document servers
without directing any load to them. These replicas are the
above-noted "dummy" replicas that are used only in the
event of a document server failure. When failures occur,
these replicas may be made "active" in order to rebalance the
load, as will be described in greater detail below.

A homogeneous server cluster is a special case of the
heterogeneous cluster and can use the above-described binning
algorithm to achieve a balanced load. However, the
binning algorithm generally does not achieve local balance
as defined previously. An exemplary two-phase algorithm
that achieves local balance for homogeneous clusters will be
described below. This two-phase algorithm uses the binning
algorithm in its second phase.

Let the access rate of document i be r_i and the total access
rate of the M documents be r = r_1 + r_2 + ... + r_M. Consider k
replicas of document j. The k replicas are referred to as
"copies" of the document; the copies may also be replicated
and will be referred to as "replicated copies." Let the access
rate of copy j of document i be r_i^j. Assume the access rate r_j
of document j is divided equally among its copies. Arrange
the document copies in ascending order according to their
access rates. Assume without loss of generality that
r_1^1 = r_1^2 = ... = r_1^k ≤ r_2^1 = r_2^2 = ... = r_2^k ≤ ... ≤ r_M^1 = ... = r_M^k,
where r_1^1 = r_1/k, r_2^1 = r_2/k, and so on. The
goal of the two-phase algorithm is to distribute these documents
such that after distribution, R_1 = R_2 = ... = R_N = r/N. The
ith copy of the jth document is denoted by j^i. The pseudo-code
for the two-phase algorithm is shown below. It should
be noted that the two-phase algorithm could be modified in
a straightforward manner to allow for different documents to
have different replication requirements, i.e., k could be
different for different documents. For simplicity of
illustration, the description will assume that k is the same for
all documents.



Let U be the set of documents replicated and arranged as described above.
Let S be the set of servers where the servers are randomly ordered as
S = {S_1, S_2, ..., S_N}.
Let CP_i^d be the cumulative access rate of server S_i after d documents have
been placed on it.
Phase 1:
d = 1;
for all i from 1 to N set CP_i^0 = 0;
Start: In round d {
    Place the next N document copies from U in order on the N servers
    in S and while placing a document j^i on S_e, do the following: {
        If (CP_e^(d-1) + r_j^i < r/N) then {
            Place document j^i on S_e;
            U = U - {j^i};
            CP_e^d = CP_e^(d-1) + r_j^i;
        }
        Else {
            CP_e^d = r/N;
            δ = r_j^i - (r/N - CP_e^(d-1));
            D = d;
            exit Phase 1;
        }
    }
    d = d + 1;
    go to Start;
}
End of Phase 1.
Phase 2:
Order the servers in the order S = {S_(e+1), S_(e+2), ..., S_N, S_1, ..., S_e};
while S is not empty {
    Choose the next server S_i in set S;
    R_i = CP_i^D;
pic_doc: If (δ == 0) then {
        Choose next document k, continuing in the same order as
        in Phase 1;
        Place this document on S_i;
        δ = r_k;
    }
    Else
        Place the current document copy on S_i;
    If (R_i + δ > r × l_i/l) then {
        δ = δ - (r × l_i/l - R_i);
        R_i = r × l_i/l;
        S = S - {S_i};
    }
    Else if (R_i + δ == r × l_i/l) then {
        R_i = r × l_i/l;
        δ = 0;
        S = S - {S_i};
        U = U - {k};
    }
    Else if (R_i + δ < r × l_i/l) then {
        R_i = R_i + δ;
        δ = 0;
        U = U - {k};
        go to pic_doc;
    }
} end while
End of Phase 2.
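A compact runnable sketch of the two phases is given below for a homogeneous cluster, where every server's target is r/N. It is an illustration under assumptions, not the patent's implementation: access rates are assumed positive, servers are taken in index order rather than a random order, comparisons use a small tolerance `eps`, and the function name `two_phase` and its return shape are hypothetical.

```python
def two_phase(rates, n_servers, k, eps=1e-9):
    """Round-robin placement (Phase 1) followed by binning (Phase 2).

    rates[j] is the access rate of document j; each document has k copies
    carrying rates[j] / k each. Returns (placement, load), where
    placement[i] maps (document, copy) -> access rate mapped onto server i.
    """
    cap = sum(rates) / n_servers                    # target r/N per server
    # Copies in ascending order of access rate, k copies per document.
    copies = [(j, c, rates[j] / k)
              for j in sorted(range(len(rates)), key=lambda j: rates[j])
              for c in range(k)]
    load = [0.0] * n_servers
    placement = [dict() for _ in range(n_servers)]

    def put(i, j, c, share):
        placement[i][(j, c)] = placement[i].get((j, c), 0.0) + share
        load[i] += share

    # Phase 1: round-robin until a copy would push some server to r/N.
    pos, e, delta = 0, 0, 0.0
    while pos < len(copies):
        j, c, rate = copies[pos]
        e = pos % n_servers
        if load[e] + rate < cap - eps:
            put(e, j, c, rate)
            pos += 1
        else:
            share = cap - load[e]                   # what still fits on S_e
            put(e, j, c, share)
            delta = rate - share                    # residual of this copy
            break
    if pos < len(copies) and delta <= eps:          # copy fitted exactly
        pos += 1
        delta = copies[pos][2] if pos < len(copies) else 0.0

    # Phase 2: binning on residual capacities, servers S_(e+1), ..., S_e.
    for i in [(e + 1 + t) % n_servers for t in range(n_servers)]:
        while pos < len(copies) and cap - load[i] > eps:
            j, c, _ = copies[pos]
            share = min(delta, cap - load[i])
            put(i, j, c, share)
            delta -= share
            if delta <= eps:                        # copy exhausted: next one
                pos += 1
                delta = copies[pos][2] if pos < len(copies) else 0.0
    return placement, load
```

For rates `[2, 4, 6, 12]` with three servers and k=2, Phase 1 stops in the third round when a copy of the largest document no longer fits on the first server, and Phase 2 spreads the remainder so that every server carries exactly 8 units.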



In the first phase of the two-phase algorithm shown above, 
a round-robin technique is used. This phase of the algorithm 
consists of a number of rounds. In the first round, the first N 
document copies are placed on the first N document servers; 
in the second round, the next N document copies are placed 
on the N document servers, and so on. Each time a document 
is placed on a document server, the cumulative access rate 
of all the documents on that server is calculated. After d
documents have been placed on a server (i.e., after d
rounds), the cumulative access rate of document server S_i is
CP_i^d. Because the documents are placed in ascending
order of their access rates,
CP_1^d ≤ CP_2^d ≤ ... ≤ CP_N^d ≤ CP_1^(d+1) ≤ ....
The first time the cumulative access rate of a server S_i exceeds
the maximum allowed for that server (r/N), the first phase ends.
Assume that this happens at round D. Let the server where
this happened be S_e and the document copy which made this
happen be j^l. Let the access rate of j^l that could not
be directed to S_e, without the cumulative access rate of the
server exceeding r/N, be δ. Thus, at the end of the first phase,
CP_1^D ≤ CP_2^D ≤ ... ≤ (CP_e^D = r/N) > CP_(e+1)^(D-1) ≤ ... ≤ CP_N^(D-1).
At the end of the first phase, only server S_e has
reached its maximum cumulative access rate and the cumulative
access rates of all the servers are in circularly ascending
order starting from server S_(e+1). Document copies j^l,
..., j^k, ..., M^1, M^2, ... still need to be distributed.
In the round-robin scheme, replication of a given document 
copy takes place only if a server exceeds its capacity when 
this document copy is placed on that server. Since the first 
phase ends when any server exceeds its access rate, there is 
no replication of document copies during this phase. 

In the second phase of the two-phase algorithm, the
remaining document copies are distributed using the binning
algorithm on the residual capacities of the servers. Document
copies are placed on a server until a cumulative access
rate of r/N (i.e., the maximum allowed access rate) is
reached before being placed on the next server. Thus,
document copy j^l is replicated on S_(e+1) and its remaining
access rate δ is mapped to this server. If the cumulative
access rate of S_(e+1) exceeds r/N because of this mapping,
then j^l is replicated on S_(e+2) and the remaining access rate is
mapped to this server, and so on. Since the binning algorithm
is used in the second phase, it follows from the above
description of the algorithm that when a document copy is
placed on a server, either all its access rate can be completely
mapped onto that server without the cumulative access rate
of the server exceeding r/N, or the cumulative access rate
will exceed r/N, in which case the document copy is
replicated on the next server. Again, this means that placement
of a document copy on a server in the second phase
either exhausts the access rate of the document copy or
exhausts the server. Following arguments from the discussion
of the binning algorithm, not more than N-1 "redundant"
replicas of document copies will be placed during the
second phase; this together with the fact that there is no
replication of document copies during the first phase means
that at most N-1 documents will have more than k replicas
placed on the servers in the cluster. It is possible that two
copies of the same document may be placed on the same
server, in which case these copies will be merged to form a
single copy. It will be shown below that after distribution at
least k copies of a document will be available in the server
cluster.

The second phase of the algorithm starts only when some
document server exceeds its capacity and because of the
cyclical ascending nature in the way servers are exhausted,
it can be seen that not more than one document copy can be
completely exhausted in a given server in this phase. That is,
two document copies cannot be completely exhausted in a
given server, which implies that at least one of them will be
replicated in another server. As a result, even if two document
copies are merged together, one of them will regenerate
another copy on another server and hence the goal of
distributing at least k document copies will still be satisfied.

At the end of the second phase, each server will have a
cumulative document access rate of exactly r/N in this
embodiment. A number of examples will be provided below
to illustrate the above-described distribution algorithms and
redirection mechanism.
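
The two phases described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name two_phase, the floating-point tolerance eps, and the equal split of each document's access rate among its k copies are choices made for the sketch, and k is assumed not to exceed the number of servers. The input values are the access probabilities and k=2 setting of the homogeneous example in this description.

```python
from collections import defaultdict

def two_phase(access, k, n_servers, eps=1e-9):
    """Return (placed, load): per-server {doc: mapped rate} and total load."""
    cap = sum(access.values()) / n_servers               # r/N
    # k copies per document, rate split equally, in ascending order of rate
    copies = [(doc, rate / k)
              for doc, rate in sorted(access.items(), key=lambda kv: kv[1])
              for _ in range(k)]
    placed = [defaultdict(float) for _ in range(n_servers)]
    load = [0.0] * n_servers

    def put(server, doc, amount):
        placed[server][doc] += amount     # copies of one document merge here
        load[server] += amount

    # Phase 1: round robin until a copy would push a server to r/N or beyond
    server, idx, pending = 0, 0, []
    while idx < len(copies):
        doc, rate = copies[idx]
        if load[server] + rate < cap - eps:
            put(server, doc, rate)
            idx, server = idx + 1, (server + 1) % n_servers
        else:
            fits = cap - load[server]
            put(server, doc, fits)                       # server S_e is now full
            pending = [(doc, rate - fits)] + copies[idx + 1:]
            break

    # Phase 2: binning on the residual capacities, starting at S_(e+1)
    for doc, remaining in pending:
        while remaining > eps:
            tries = 0
            while cap - load[server] <= eps:             # skip exhausted servers
                server, tries = (server + 1) % n_servers, tries + 1
                if tries > n_servers:
                    raise RuntimeError("no residual capacity left")
            mapped = min(remaining, cap - load[server])
            put(server, doc, mapped)
            remaining -= mapped
    return [dict(p) for p in placed], load

placed, load = two_phase({1: 0.35, 2: 0.5, 3: 0.05, 4: 0.04, 5: 0.06},
                         k=2, n_servers=3)
for i, docs in enumerate(placed, start=1):
    print(f"S{i}: documents {sorted(docs)}, load {load[i - 1]:.3f}")
```

Because the sketch uses a capacity of exactly 1/3 rather than the rounded value 0.333, individual mapped amounts differ slightly in the third decimal place from the figures quoted in the description.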

FIG. 2 shows a first example of a server system including
a single redirection server 14-1 and a heterogeneous server
cluster 16 with three document servers S_1, S_2, S_3 and five
documents designated 1, 2, ..., 5 distributed across the three
document servers as shown. Assume that the access probabilities
of the five documents 1, 2, ..., 5 are 0.35, 0.5, 0.05,
0.04 and 0.06, respectively. The access probability in this
example is the probability that when an HTTP request
arrives at the redirection server 14-1, it is a request for the
corresponding document. The access probabilities can be
obtained by scaling the access rates. Each document is
replicated once (i.e., k=1). Also assume that the scaled
capacities (l_i/l) of the document servers S_1, S_2, S_3 are 0.3, 0.6
and 0.1, respectively. This is the target load that should be
reached on each server in order to achieve load balance. The
binning algorithm chooses documents and servers at random.
For this example, assume that the documents are
picked in the order 1 through 5 and the servers are also
picked in numerical order. First, document 1 is placed on S_1.
The capacity of S_1 (0.3) is exceeded and hence 1 is replicated
on S_2 and the residual access probability of 0.05 is
mapped to that server. Now, document 2 is placed on S_2 and
is exhausted. Next, document 3 is placed on S_2 and with this
placement, both S_2 and document 3 are exhausted. Then,
both documents 4 and 5 are placed on S_3 thereby exhausting
both the documents and the server. This distribution leads to
only document 1 having more than one copy. Hence, there
is only one redundant replication. Note that the theoretical
upper limit for the number of redundant copies in the system
is N-1=2.
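
The walk-through above can be reproduced with a short Python sketch of the binning step. It is an illustration only: the names binning and redirection_probabilities and the tolerance eps are assumptions of the sketch, documents and servers are taken in dictionary order in place of the random choice the description mentions, and the redirection probability of a copy is taken to be its mapped share divided by the document's access probability.

```python
from collections import defaultdict

def binning(access, capacity, eps=1e-9):
    """Map each document's access probability onto servers in the given order.

    access:   {document: access probability}, iterated in insertion order
    capacity: {server: scaled capacity}, iterated in insertion order
    Returns {document: {server: mapped probability}}.
    Raises StopIteration if the total access exceeds the total capacity.
    """
    placement = defaultdict(dict)
    servers = iter(capacity.items())
    server, room = next(servers)
    for doc, rate in access.items():
        remaining = rate
        while remaining > eps:
            if room <= eps:                  # current server is exhausted
                server, room = next(servers)
            mapped = min(remaining, room)    # replicate only when it overflows
            placement[doc][server] = mapped
            remaining -= mapped
            room -= mapped
    return dict(placement)

def redirection_probabilities(placement, access):
    """Share of a document's requests sent to each server holding a copy."""
    return {doc: {srv: share / access[doc] for srv, share in shares.items()}
            for doc, shares in placement.items()}

access = {1: 0.35, 2: 0.5, 3: 0.05, 4: 0.04, 5: 0.06}
capacity = {"S1": 0.3, "S2": 0.6, "S3": 0.1}
placement = binning(access, capacity)
probs = redirection_probabilities(placement, access)
print(placement)
print(probs)
```

Only a document that overflows a server is split, so each server boundary can create at most one extra copy, which is consistent with the N-1 bound on redundant replicas.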

If the example is altered such that k>1, then the extra
replicas that are required can be distributed as dummy
replicas randomly on the document servers, without repeating
a document copy on a server where it already exists.
These dummy replicas have zero access probabilities.
Instead of random distribution, the dummy replicas could
also be distributed to balance other parameters such as
space. For example, if k=2, then dummy replicas of documents
2, 3, 4 and 5 (there are already two copies of 1 in the
system) can be distributed such that 4 and 5 are placed on S_1
and 2 and 3 are placed on S_3. This way, S_1 and S_2 will store
three document copies and S_3 will store four document
copies. The redirection mechanism will now work as follows.
Requests to document 1 will be redirected to S_1 with
probability 0.3/0.35=0.86 and to S_2 with probability
0.05/0.35=0.14. All requests to documents 2 and 3 will be
redirected to S_2 and all requests to documents 4 and 5 will
be redirected to S_3. This redirection mechanism is illustrated
in FIG. 2 by redirection probabilities assigned to the
interconnections between the redirection server 14-1 and the
document servers S_1, S_2 and S_3.

As another example consider a homogeneous system
with the same server configuration as in the example of FIG.
2 but with a value of k=2. Assume again that there are five
documents 1, 2, ..., 5 with the same access probabilities
as above. Each document server S_1, S_2 and S_3 has a scaled
capacity of 0.333. In the first phase, the round-robin technique
orders the document copies in ascending order of their
access probabilities, to produce an order of 4^1, 4^2, 3^1, 3^2,
5^1, 5^2, 1^1, 1^2, 2^1, 2^2. Note that in this example the access
probability of a document will be equally split between the
copies of the document for local balance. In the first round,
documents 4^1, 4^2, 3^1 are placed on document servers S_1, S_2
and S_3, respectively. The cumulative probabilities after the
first round are CP_1^1=0.02, CP_2^1=0.02 and CP_3^1=0.025. In
the second round, documents 3^2, 5^1 and 5^2 are placed on
document servers S_1, S_2 and S_3, respectively. Now, the
cumulative probabilities become CP_1^2=0.045, CP_2^2=0.05 and
CP_3^2=0.055. In the third round, documents 1^1, 1^2 and 2^1 are
placed on document servers S_1, S_2 and S_3, respectively.
Then, CP_1^3=0.220, CP_2^3=0.225 and CP_3^3=0.305. In the
fourth round, document 2^2 is first placed on document server
S_1. Server S_1 is exhausted after 0.113 access probability of
2^2 is mapped to it. This concludes the first phase. In the
second phase, which uses the binning algorithm, 2^2 is
replicated on S_2 and 0.108 of the remaining access probability
of 2^2 (which is 0.137 at this point) is mapped to S_2.
This exhausts S_2. Document 2^2 is again replicated on S_3 and
the remaining access probability of 0.029 is mapped to S_3.
This exhausts both 2^2 and S_3. Note that at this point, S_3
already contained 2^1. This copy is merged with the copy of
2^2 and their access probabilities are combined
(0.25+0.029=0.279).

After the above-described distribution, server S_1 contains
copies of documents 1, 2, 3 and 4, server S_2 contains copies
of documents 1, 2, 4 and 5 and server S_3 contains copies of
documents 2, 3 and 5. It should be noted that there is one
redundant copy of document 2. Although copies of document
2 are merged in server S_3, still there are at least two
copies of 2 in the system. The redirection mechanism will
use the probabilities shown in TABLE 1 to redirect requests
to the different copies. TABLE 1 is organized such that an
entry for row S_i and column one in the table indicates the
probability with which the request for document 1 will be
directed to server S_i.

TABLE 1

The load balance that is achieved through initial distribution
of the documents may be disturbed during the operation
of the cluster due to the following changes: (1)
server failures; (2) changes in the access probabilities of the
documents, i.e., some documents which were initially hot
could become cold and vice versa; and (3) changes in the
capacity of the document servers. In the case of dedicated
Web servers, the first and second changes are generally more
likely than the third. In the event any of these changes occur,
it is desirable to be able to rebalance the server loads without
any major reconfiguration of the system. More specifically,
the rebalancing should be accomplished without redistributing
documents among the servers, since this would generally
involve overhead in moving the documents between
servers and may affect the availability of the system. It is
preferable to instead achieve rebalance by adjusting only the
redirection probabilities used by the redirection server. As
will be shown below, this rebalancing can be characterized
as a network flow problem. It should be noted that the
network flow approach is suitable for use in situations in
which documents on a failed server are replicated and
therefore available on another server or servers. In situations
in which the documents on the failed server are not
replicated, the above-described binning algorithm may be
used to redistribute the documents from the failed server to
achieve rebalance with a minimal amount of document
movement.

FIG. 3 shows a flow network diagram characterizing the
initial document distribution in the above-described two-phase
example with k=2. The flow network is of a type
described in, for example, R. K. Ahuja et al., "Network
Flows: Theory, Algorithms and Applications," Prentice Hall,
1993, which is incorporated by reference herein. The flow on
the arcs from the documents to the servers shows how the
access probabilities of the different documents are mapped
to the servers. The flows on the arcs from the servers to the
sink, which are equal to the capacities as constrained by the
initial distribution algorithm, show that the load is balanced.
The costs on all the arcs except the ones marked "high" are
equal to zero. The redundant arcs from the servers to the sink
represent arcs that have infinite capacity but also have
"high" cost. These are arcs through which no flow takes
place at present but will be used for excess flow when a
change occurs in the system. The document distribution
represented in FIG. 3 is a maximum-flow minimum-cost
solution to the corresponding flow network problem.

FIG. 4 is a flow network diagram illustrating a rebalancing
in response to a server failure in the system
represented by the flow network diagram of FIG. 3. It will
be assumed for this rebalancing example that server S_3 fails.
The goal of the rebalancing process is to recompute the
flows on the arcs between the documents and the servers
from which the new redirection probabilities can be computed.
Because S_3 has failed, there cannot be any flow
redirected towards that server and hence the costs of the arcs
from this server to the sink are made "very high" (i.e., >high)
as shown in FIG. 4. To achieve load balance, the capacities
of the remaining two servers are made 0.5 each. The 
resulting flow network problem is solved to find the 
maximum-flow minimum-cost solution. This solution may 
be obtained using the mathematical programming language
AMPL as described in R. Fourer et al., "AMPL: A Modeling
Language For Mathematical Programming," The Scientific
Press, 1993, which is incorporated by reference herein, in 
conjunction with a linear program solver such as MINOS. 

As in the FIG. 3 diagram, all of the arcs except the ones
marked "high" and "very high" have zero cost. Hence, as
much flow as possible is pushed through those arcs from the
servers to the sink that have capacities 0.5. Only if the flow
cannot be made equal to the capacity on such an arc (e.g.,
from S_2 to the sink) will there be excess flow on the "high"
cost arcs (in this case, from S_1 to the sink). Of course,
there will be no flow on the arcs with "very high" cost. The
solution to this flow network problem will provide the flows 
on the arcs between the documents and the servers. 
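
The rebalancing network just described can be sketched in Python as follows. This is an illustration under stated assumptions rather than the AMPL/MINOS formulation named above: it swaps in a textbook successive-shortest-path solver with Bellman-Ford, expresses probabilities as integer thousandths so that all arithmetic is exact, and uses illustrative numeric values for the "high" and "very high" costs. The names MinCostFlow, rebalance, HIGH, VERY_HIGH and UNBOUNDED are hypothetical. The document placement is the one produced by the two-phase example with k=2.

```python
from collections import defaultdict

HIGH, VERY_HIGH = 1_000, 1_000_000      # illustrative arc costs
UNBOUNDED = 10**9                       # stands in for infinite capacity

class MinCostFlow:
    """Successive shortest paths with Bellman-Ford on an integer network."""

    def __init__(self):
        self.adj = defaultdict(list)    # node -> indices into self.edges
        self.edges = []                 # [head, residual capacity, cost]

    def add_arc(self, u, v, cap, cost):
        self.adj[u].append(len(self.edges))
        self.edges.append([v, cap, cost])
        self.adj[v].append(len(self.edges))
        self.edges.append([u, 0, -cost])        # paired reverse arc (index ^ 1)
        return len(self.edges) - 2

    def solve(self, s, t):
        flow = cost = 0
        while True:
            dist, via = {s: 0}, {}
            changed = True
            while changed:                      # Bellman-Ford relaxation
                changed = False
                for u in list(dist):
                    for i in self.adj[u]:
                        v, cap, c = self.edges[i]
                        if cap > 0 and dist[u] + c < dist.get(v, float("inf")):
                            dist[v], via[v] = dist[u] + c, i
                            changed = True
            if t not in dist:
                return flow, cost
            push, v = UNBOUNDED, t
            while v != s:                       # bottleneck along the path
                push = min(push, self.edges[via[v]][1])
                v = self.edges[via[v] ^ 1][0]
            v = t
            while v != s:                       # augment along the path
                self.edges[via[v]][1] -= push
                self.edges[via[v] ^ 1][1] += push
                v = self.edges[via[v] ^ 1][0]
            flow, cost = flow + push, cost + push * dist[t]

def rebalance(access, stored, capacity, failed=()):
    """Return {(doc, server): flow} for a max-flow min-cost solution."""
    net, doc_arcs = MinCostFlow(), {}
    for doc, units in access.items():
        net.add_arc("source", ("doc", doc), units, 0)
    for srv, docs in stored.items():
        for doc in docs:
            doc_arcs[doc, srv] = net.add_arc(("doc", doc), ("srv", srv), UNBOUNDED, 0)
        if srv in failed:
            net.add_arc(("srv", srv), "sink", UNBOUNDED, VERY_HIGH)
        else:
            net.add_arc(("srv", srv), "sink", capacity[srv], 0)
            net.add_arc(("srv", srv), "sink", UNBOUNDED, HIGH)   # redundant arc
    net.solve("source", "sink")
    # the flow on a forward arc equals the residual capacity of its reverse arc
    return {key: net.edges[i ^ 1][1] for key, i in doc_arcs.items()}

access = {1: 350, 2: 500, 3: 50, 4: 40, 5: 60}            # thousandths
stored = {"S1": [1, 2, 3, 4], "S2": [1, 2, 4, 5], "S3": [2, 3, 5]}
flows = rebalance(access, stored, {"S1": 500, "S2": 500}, failed={"S3"})
load = defaultdict(int)
for (doc, srv), units in flows.items():
    load[srv] += units
probs = {key: units / access[key[0]] for key, units in flows.items()}
print(dict(load))
```

A maximum-flow minimum-cost solution is generally not unique, so the individual document-to-server flows returned by this sketch may differ from those drawn in FIG. 4 even though the resulting server loads agree.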

Note that the flow on all the arcs coming into S_3 will
evaluate to zero. It is possible that there does not exist a 
solution that provides exact load balance. In that case, the 
solution obtained may alternatively provide the load balance 
property that the sum of the variation in the load on all the 
servers is minimized. This property may be provided by the 
network flow algorithm used and is referred to herein as 
"approximate" load balance. Requiring that documents not 
be redistributed after a server failure may limit the achiev- 
able load balance to "approximate" load balance in certain 
applications. 

A solution obtained for the FIG. 4 flow network problem 
using the above-noted AMPL language and the MINOS 
linear program solver is shown as flows on the arcs between 
the documents and the servers in FIG. 4. The solution shown 
provides exact load balance in this example, although it 
should be noted that the local balance property for the
document copies is no longer satisfied. The new redirection
probabilities for the rebalancing are shown in TABLE 2 
below. 

TABLE 2

Redirection Probabilities for Rebalancing Example of FIG. 4

       Document 1   Document 2         Document 3   Document 4   Document 5
S_1    1            0.06/0.5 = 0.12    1            1            0
S_2    0            0.44/0.5 = 0.88    0            0            1


If the access rates of some of the documents change, new 
redirection probabilities can again be computed by formu- 
lating the problem as a maximum-flow minimum-cost prob- 
lem. Consider the original configuration described in con- 
junction with FIG. 3 and assume that the access probabilities 
of documents 1 and 3 are changed as shown in the flow 
network of FIG. 5. If this flow network is solved for 
maximum flow with minimum cost, the solution will pro- 
vide the flow on the arcs between the documents and the 
servers from which the new redirection probabilities can be 
computed. The solution is shown in FIG. 5 and it can be seen 
that this solution achieves exact load balance. The new 
redirection probabilities are shown in TABLE 3 below.

FIG. 6 shows a flow diagram for another example in
which the capacities of the servers S_1, S_2 and S_3 have been
changed to be 0.2, 0.05 and 0.75 respectively. The changes
in the capacities are seen on the arcs from the servers to the 
sink. Again, a maximum-flow minimum-cost solution will 
provide the flow on the arcs from the documents to the 
servers from which new redirection probabilities can be 
calculated. The solution for this example is shown on the 
document-server arcs in FIG. 6. On the arcs between the
servers and sink, the flow obtained is shown in parentheses.
Unlike the previous two examples, in this case the solution
is able to achieve only "approximate" load balance. The flow
from server S_3 to the sink, which specifies the load on S_3, is
0.61, which is 0.14 less than its capacity. This load has been
diverted to S_1 as extra load as shown on the redundant arc
between S_1 and the sink. Thus, the load on S_1 is now 0.34.
The new redirection probabilities are shown in TABLE 4 
below. 
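
The arithmetic of this capacity-change example can be checked with a few lines of Python. The sketch assumes one particular reading of "variation", namely the sum over servers of the absolute difference between load and capacity; the function name load_variation is hypothetical, and the load of 0.05 on S_2 is not stated in the text but inferred from the requirement that the three loads sum to one.

```python
def load_variation(load, capacity):
    """Sum over servers of |load - capacity| (one reading of 'variation')."""
    return sum(abs(load[s] - capacity[s]) for s in capacity)

capacity = {"S1": 0.20, "S2": 0.05, "S3": 0.75}
# S1 and S3 loads are quoted in the text; the S2 load is inferred.
load = {"S1": 0.34, "S2": 0.05, "S3": 0.61}

shortfall_s3 = capacity["S3"] - load["S3"]     # load S3 could not absorb
excess_s1 = load["S1"] - capacity["S1"]        # extra load carried by S1
print(round(shortfall_s3, 2), round(excess_s1, 2),
      round(load_variation(load, capacity), 2))
```

The shortfall on S_3 and the excess on S_1 are equal, which is what diverting the load over the redundant arc of S_1 implies.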

As described above, the present invention utilizes a redirection
mechanism for achieving load balance among a
cluster of document servers. The redirection mechanism in
the illustrative embodiments is an integral part of the HTTP
protocol and is supported by all browsers and web servers.
Alternative embodiments of the invention may utilize a
redirection mechanism implemented at a higher level, such
as redirection at the router level based on Internet Protocol
(IP) addresses. In an HTTP redirection embodiment, if a
client request received at a redirection server is to be
redirected to another server, the original redirection server
sends a redirection message to the client. The redirection
message typically contains the URL or other identifier of the
new server. The client then makes a separate request to the
new server. The mapping which dictates that a URL should
be redirected to another server may be located in the
configuration file, which can be identified by a .conf suffix.
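
The probabilistic HTTP redirection described in this section can be sketched as follows. This is a minimal illustration, not the server implementation of the embodiments: the REDIRECTS table, the function names and the example.com host names are hypothetical, the weights mirror the 0.4/0.2/0.4 split that the description uses for the /user1 directory, and the response is a bare HTTP/1.1 302 message with a Location header.

```python
import random

# Hypothetical mapping table: URL prefix -> [(probability, target URL), ...].
REDIRECTS = {
    "/user1": [
        (0.4, "http://srv1.example.com/user1"),
        (0.2, "http://srv2.example.com/user1"),
        (0.4, "http://srv3.example.com/user1"),
    ],
}

def choose_target(path, rng=random):
    """Pick a document-server URL for `path` according to its probabilities."""
    for prefix, entries in REDIRECTS.items():
        if path == prefix or path.startswith(prefix + "/"):
            weights = [p for p, _ in entries]
            urls = [u for _, u in entries]
            base = rng.choices(urls, weights=weights, k=1)[0]
            return base + path[len(prefix):]
    return None                      # no mapping: nothing to redirect to

def redirect_response(path, rng=random):
    """Build the redirection message sent back to the client."""
    target = choose_target(path, rng)
    if target is None:
        return "HTTP/1.1 404 Not Found\r\n\r\n"
    return f"HTTP/1.1 302 Found\r\nLocation: {target}\r\n\r\n"

print(redirect_response("/user1/index.html"))
```

Passing a seeded random.Random instance as rng makes the choice reproducible, which is convenient when checking that the observed split approaches the configured probabilities.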

Since a document may be replicated on more than one
server, an incoming request for the document can be mapped
to any of the servers on which the document exists. In the
system 10 of FIG. 1, a request for a document is directed by
the redirection server 14-1 or 14-2 to one of the document
servers in server cluster 16 with a predetermined probability.
This probability is determined by the document distribution
algorithm described above. A configuration file htd.conf may
specify the mapping of a URL to multiple URLs. For a given
URL, each document server which is capable of serving the
document associated with the URL has a probability associated
with it. This probability can be specified along with
the document mapping in the configuration file. For
example, for a URL given by /user1 the entry in htd.conf
may be given by:

    Redirect /user1  [0.4] http://srv1.lucent.com/user1
                     [0.2] http://srv2.lucent.com/user1
                     [0.4] http://srv3.lucent.com/user1

When a redirection server is initialized, it stores the above
information in table form in its internal data structures.
When a request for a document in directory /user1 is
received in the redirection server, the redirection server
performs a look-up into its internal table to check where the
request should be directed. In this example, the document
servers srv1, srv2 and srv3 possess the replicated directory
/user1 and the above entry in htd.conf specifies that the
redirection server should redirect any request for /user1 to
srv1 and srv3 with a probability of 0.4 each, and to srv2 with
a probability of 0.2. Using these probabilities, the redirection
server chooses one document server and sends a redirect
message with the relevant URL to the client, which then
connects directly to the document server using this URL. In
order to accommodate requests which come to the document
servers directly, e.g., all the relative URLs inside a document
server URL, the document servers may also have a configuration
file. However, the configuration file at the document
server may be slightly modified to prevent further redirection
of requests which have to be served by that server.
Therefore, for directories and files which exist on a document
server, e.g., srv1, the entry in the configuration file
either does not exist or points to a directory in srv1.

The document distribution process may be implemented
as follows. First, given the degree of replication, the access
rates and sizes for different documents in the server system,
run the initial distribution algorithm to generate the mapping
of documents to servers and the corresponding probability
numbers. Second, create a new configuration file for the
redirection server and also for each document server. Third,
use the UNIX rdist function to update the documents and the
configuration files in the document servers. Finally, using
the special primitive in rdist, restart the servers with the new
configuration files. The UNIX rdist facility is a software tool
for copying and synchronizing files remotely. It takes actions
specified in a file called distfile which is similar to a
Makefile. For example, after the document distribution has
been determined, the distfile given below may be used to
update the document server srv1:

    WEBSERVERS = (srv1)
    WEBDIR = (/WWW/srv1)
    DEST = (/WWW)
    default:
    ${WEBDIR} -> ${WEBSERVERS}
        install -w -R ${DEST};
        special /WWW/srv1/htd.conf "kill -HUP `cat /WWW/logs/httpd.pid`";

When rdist is run on the redirection server with the above
distfile, it updates the directory /WWW contained on srv1,
by copying the contents of the directory /WWW/srv1 contained
locally on the redirection server. The special primitive
in rdist allows an action to be performed once a file with a
given name has been copied. Therefore, this primitive can be
used to send a "hangup" signal to a redirection server once
the configuration file with the new redirection probabilities
has been generated. The signal handler in the redirection
server reads the new configuration file, reinitializes its
internal data structures and starts processing requests with
the updated redirection probabilities.

Many conventional servers are equipped to gather information
about the accesses to the server. This information is
typically logged in a log file. For each access to a server,
information like the IP address of the client, number of bytes
sent back, time to process the request, response code and the
time at which the request was made, can be logged into the
log file. The log file can be post-processed to determine the
access rates to the various documents in the server. Since the
above-described embodiments of the invention use a probabilistic
approach to distributing documents on the document
servers, dynamic load balancing is generally not performed.
Instead, the average access rate for each server is
equalized over a sufficiently long period of time, which may
be on the order of a few hours to a day. If the average access
rates vary, in order to keep the load balanced the redirection
probabilities can be recomputed periodically and the techniques
described above used to communicate this change to
the redirection server and the document servers. The redirection
probabilities may be recomputed based on access
data for a specific amount of time, such as the previous hour.
The access data may be gathered in conjunction
with a sliding window based on access log
information. For example, new access probabilities could be
computed every 15 minutes based on logged data from the
previous hour. The sliding window approach may be better
able to smooth out dynamic access rate changes in certain
applications.

A fault tolerant server system in accordance with the
invention provides replication of documents on a cluster of
servers and guarantees load-balanced access to the documents
even in the presence of server failures. The
redirection-based load balancing of the present invention
may also be implemented in systems other than those in the
illustrative embodiments described above. For example, the
load balancing of the invention could be applied to systems
in which document servers are geographically distributed, as
described in, for example, J. Gwertzman and M. Seltzer, "An
Analysis of Geographical Push-Caching," HotOS '95, 1995.
These and numerous other alternative embodiments within
the scope of the following claims will be readily apparent to
those skilled in the art.

What is claimed is:

1. A method of processing client requests received in a
server system over a communication network, the method
comprising the steps of:

determining a distribution of a set of documents over a
plurality of servers based at least in part on access rates
of at least a subset of the documents;

computing a set of redirection probabilities based on the
distribution;

routing a client request to a redirection server; and

redirecting the client request from the redirection server to
one of the plurality of document servers based on the
set of redirection probabilities.

2. The method of claim 1 wherein at least one of the
document servers is an HTTP server.

3. The method of claim 1 wherein each of the document
servers provides access to only a subset of a given set of
documents available in the server system.

4. The method of claim 3 wherein the subset of documents
for a given document server are available locally on that
document server.

5. The method of claim 3 wherein the subset of documents
for a given document server are available to that server over 
a local area network. 

6. The method of claim 3 wherein the redirection server 
redirects a request for a given document to one of the
document servers only if a copy of the given document is 
available to that document server. 

7. The method of claim 1 wherein the documents are 
permitted to be replicated, such that more than one document
server may be a candidate for servicing a given client
request. 

8. The method of claim 1 further including the step of 
determining the redirection probabilities and an initial dis- 
tribution of the set of documents across the plurality of 
document servers using a load distribution algorithm.

9. The method of claim 8 wherein the load distribution 
algorithm ensures that at least a specified number k of 
replicas of each document are present in the system after the 
initial document distribution is complete. 

10. The method of claim 8 wherein the system includes N
document servers and the load distribution algorithm
ensures that no more than N-1 redundant replicas of documents
are present in the system after the initial document
distribution is complete. 

11. The method of claim 8 wherein the load distribution
algorithm uses a load balancing metric to distribute the 
documents across the document servers such that request 
load is balanced across the document servers. 

12. The method of claim 11 wherein the load balancing 
metric is the access rates of the documents.

13. The method of claim 12 wherein the load distribution 
algorithm attempts to equalize the sum of the access rates of 
all the documents accessible to a given document server 
across all of the document servers. 

14. The method of claim 11 wherein the load balancing
metric is scaled access rates of the documents, wherein one
of the scaled access rates is obtained for a given document
by multiplying an access rate for that document by the size 
of that document. 

15. The method of claim 12 wherein if λ is the rate at
which client requests are received in the redirection server,
the load distribution algorithm distributes the documents
across N document servers such that the rate of requests
redirected to each server is approximately λ/N.

16. The method of claim 8 wherein the load distribution
algorithm includes the steps of: 

randomly selecting a document and one of the document 
servers; 

mapping an access rate of the selected document to the 
selected server; and

if the selected server has additional capacity, randomly 
selecting another document and mapping it to the 
selected server. 

17. The method of claim 16 wherein the load distribution 
algorithm includes the steps of: 

mapping a portion of an access rate of a randomly- 
selected document to the selected server; and 

if the selected server has insufficient capacity to accommodate
the entire access rate of the randomly-selected
document, selecting another document server at 
random, and mapping a remaining portion of the access 
rate of the randomly-selected document to the other 
randomly-selected document server. 

18. The method of claim 17 wherein the selecting and
mapping steps are repeated until the entire access rates of all
of the documents have been mapped to document servers.






19. The method of claim 8 wherein the load distribution 
algorithm is a two-phase algorithm including a first phase 
and a second phase, and the first phase includes the step of 
distributing documents to the document servers in accor- 
dance with a round robin technique, and further wherein the 
first phase ends when any server exceeds its access rate 
capacity. 

20. The method of claim 19 wherein the second phase of 
the two-phase algorithm distributes document copies to a 
document server until a cumulative maximum access rate is 
reached for that server, and then distributes document copies 
to another document server, and wherein replication of a 
given document copy takes place when a given document 
server exceeds its capacity when the given document copy 
is distributed to the given server. 

21. The method of claim 1 further including the step of 
recomputing the set of redirection probabilities after a 
change in conditions associated with the server system. 

22. The method of claim 21 wherein the change in 
conditions includes at least one of a failure of one or more 
of the document servers, a change in access rate for one or 
more documents, or a change in capacity of one of the 
document servers. 

23. The method of claim 22 wherein the recomputing step 
is based on a maximum-flow minimum-cost solution of a 
network flow problem. 

24. The method of claim 1 further including the step of 
utilizing a binning algorithm to redistribute documents from 
a failed one of the document servers to the remaining 
document servers. 

25. A server system for processing client requests 
received over a communication network, the system com- 
prising: 

a plurality of document servers, wherein a distribution of 
a set of documents over the plurality of document 
servers is determined based at least in part on access 
rates of at least a subset of the documents; and 

at least one redirection server for receiving a client 
request and for redirecting the client request to one of 
the plurality of document servers based on a set of 
redirection probabilities computed from the distribu- 
tion. 

26. The apparatus of claim 25 wherein at least one of the 
document servers is an HTTP server. 

27. The apparatus of claim 25 wherein each of the 
document servers provides access to only a subset of a given 
set of documents available in the server system. 

28. The apparatus of claim 27 wherein the subset of 
documents for a given document server are available locally 
on that document server. 

29. The apparatus of claim 27 wherein the subset of 
documents for a given document server are available to that 
server over a local area network. 

30. The apparatus of claim 27 wherein the redirection 
server redirects a request for a given document to one of the 
document servers only if a copy of the given document is 
available to that document server. 

31. The apparatus of claim 25 wherein the documents are 
permitted to be replicated, such that more than one docu- 
ment server may be a candidate for servicing a given client 
request. 

32. The apparatus of claim 25 wherein the redirection 
probabilities are determined and the set of documents are 
distributed across the plurality of document servers in accor- 
dance with a load distribution algorithm. 

33. The apparatus of claim 32 wherein the load distribu- 
tion algorithm ensures that at least a specified number k of 
replicas of each document are present in the system after an 
initial document distribution is complete. 

34. The apparatus of claim 32 wherein the system 
includes N document servers and the load distribution 
algorithm ensures that no more than N-1 redundant replicas 
of documents are present in the system after the initial 
document distribution is complete. 

35. The apparatus of claim 32 wherein the load distribu- 
tion algorithm uses a load balancing metric to distribute the 
documents across the document servers such that request 
load is balanced across the document servers. 

36. The apparatus of claim 35 wherein the load balancing 
metric is the access rates of the documents. 

37. The apparatus of claim 35 wherein the load balancing 
metric is scaled access rates of the documents, wherein one 
of the scaled access rates is obtained for a given document 
by multiplying an access rate for that document by the size 
of that document. 
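
The scaled metric of claim 37 can be sketched as follows. The document names, rates, sizes, and units are illustrative assumptions, and `scaled_access_rates` is a hypothetical helper name.

```python
def scaled_access_rates(access_rates, sizes):
    """Return access rate multiplied by document size for each document."""
    return {doc: access_rates[doc] * sizes[doc] for doc in access_rates}


# Illustrative values only: requests per second and sizes in kilobytes.
rates = {"a.html": 10.0, "b.gif": 2.0}
sizes = {"a.html": 4, "b.gif": 50}
scaled = scaled_access_rates(rates, sizes)
```

With these values the rarely requested but large `b.gif` carries more scaled load than the frequently requested `a.html`, which is the effect the scaled metric is meant to capture.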

38. The apparatus of claim 36 wherein the load distribu- 
tion algorithm attempts to equalize the sum of the access 
rates of all the documents accessible to a given document 
server across all of the document servers. 

39. The apparatus of claim 32 wherein if X is the rate at 
which client requests are received in the redirection server, 
the load distribution algorithm distributes the documents 
across N document servers such that the rate of requests 
redirected to each server is approximately X/N. 

40. The apparatus of claim 32 wherein the load distribu- 
tion algorithm is a two-phase algorithm including a first 
phase and a second phase, and the first phase distributes 
documents to the document servers in accordance with a 
round robin technique, and further wherein the first phase 
ends when any server exceeds its access rate capacity. 

41. The apparatus of claim 40 wherein the second phase 
of the two-phase algorithm distributes document copies to a 
document server until a cumulative maximum access rate is 
reached for that server, and then distributes document copies 
to another document server, and wherein replication of a 
given document copy takes place when a given document 
server exceeds its capacity when the given document copy 
is distributed to the given server. 
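
One possible reading of the two-phase algorithm of claims 40 and 41 is sketched below. It is not the patented procedure itself. It assumes that every server has the same capacity, equal to the total access rate divided by N; that documents are processed in the order given; that all access rates are positive; and that the first phase stops just before a server would be overloaded. All names are hypothetical.

```python
def two_phase_distribution(rates, n_servers):
    """Return {doc: {server index: rate share}} for a two-phase placement."""
    capacity = sum(rates.values()) / n_servers  # assumed per-server target
    load = [0.0] * n_servers
    placement = {}
    docs = list(rates)
    eps = 1e-9

    # Phase 1: round robin until the next document would overload its server.
    i = 0
    while i < len(docs):
        server = i % n_servers
        doc = docs[i]
        if load[server] + rates[doc] > capacity + eps:
            break
        placement[doc] = {server: rates[doc]}
        load[server] += rates[doc]
        i += 1

    # Phase 2: fill one server at a time. A document that does not fit in
    # the capacity left on the current server is replicated, and the rest
    # of its rate is carried by a copy on the next server.
    server = 0
    for doc in docs[i:]:
        remaining = rates[doc]
        shares = {}
        while remaining > eps:
            while server < n_servers - 1 and capacity - load[server] <= eps:
                server += 1
            share = min(remaining, capacity - load[server])
            if server == n_servers - 1:
                share = remaining  # last server absorbs rounding leftovers
            shares[server] = shares.get(server, 0.0) + share
            load[server] += share
            remaining -= share
        placement[doc] = shares
    return placement


def redirection_probabilities(placement, rates):
    """Convert per-server rate shares into per-document probabilities."""
    return {doc: {s: share / rates[doc] for s, share in shares.items()}
            for doc, shares in placement.items()}
```

For example, with rates `{"a": 5, "b": 4, "c": 3}` and two servers, the first phase places `a` and `b`, and the second phase splits `c` across both servers so that each server carries a load of 6.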

42. The apparatus of claim 25 wherein the redirection 
probabilities are recomputed after a change in conditions 
associated with the server system. 

43. The apparatus of claim 42 wherein the change in 
conditions includes at least one of a failure of one or more 
of the document servers, a change in access rate for one or 
more documents, or a change in capacity of one of the 
document servers. 
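
As a rough illustration of recomputation after a server failure, the sketch below drops the failed servers and renormalizes the remaining probabilities for each document. This is a deliberately simple stand-in and not the maximum-flow minimum-cost computation recited in claim 23; it does not by itself rebalance load across the surviving servers. The table layout and function name are hypothetical.

```python
def recompute_after_failure(table, failed):
    """Drop failed servers and renormalize each document's probabilities."""
    new_table = {}
    for doc, row in table.items():
        alive = {s: p for s, p in row.items() if s not in failed}
        total = sum(alive.values())
        if total > 0:
            new_table[doc] = {s: p / total for s, p in alive.items()}
        elif alive:
            # Surviving replicas all had zero probability: spread evenly.
            new_table[doc] = {s: 1.0 / len(alive) for s in alive}
        else:
            new_table[doc] = {}  # no surviving copy of this document
    return new_table
```

A document whose only copy was on a failed server ends up with an empty row, which is the case that replication of documents is intended to avoid.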

44. An apparatus for processing client requests received in 
a server system over a communication network, the appa- 
ratus comprising: 

means for determining a distribution of a set of documents 
over a plurality of servers based at least in part on 
access rates of at least a subset of the documents; 

means for computing a set of redirection probabilities 
based on the distribution; 

means for routing a client request to a redirection server; 
and 

means for redirecting the client request from the redirec- 
tion server to one of the plurality of document servers 
based on the set of redirection probabilities. 

45. The apparatus of claim 44 wherein each of the 
document servers provides access to only a subset of a given 
set of documents available in the server system. 

46. The apparatus of claim 44 wherein the redirection 
probabilities and an initial distribution of the set of docu- 
ments across the plurality of document servers are deter- 
mined using a load distribution algorithm. 

47. The apparatus of claim 46 wherein the load distribu- 
tion algorithm ensures that at least a specified number k of 
replicas of each document are present in the system after the 
initial document distribution is complete. 

48. The apparatus of claim 46 wherein the system 
includes N document servers and the load distribution 
algorithm ensures that no more than N-1 redundant replicas 
of documents are present in the system after the initial 
document distribution is complete. 

49. The apparatus of claim 46 wherein the load distribu- 
tion algorithm uses a load balancing metric to distribute the 
documents across the document servers such that request 
load is balanced across the document servers. 

50. The apparatus of claim 49 wherein the load balancing 
metric is the access rates of the documents. 

51. The apparatus of claim 49 wherein the load balancing 
metric is scaled access rates of the documents, wherein one 
of the scaled access rates is obtained for a given document 
by multiplying an access rate for that document by the size 
of that document. 

52. The apparatus of claim 44 wherein the set of redirec- 
tion probabilities are recomputed after a change in condi- 
tions associated with the server system. 

53. The apparatus of claim 52 wherein the change in 
conditions includes at least one of a failure of one or more 
of the document servers, a change in access rate for one or 
more documents, or a change in capacity of one of the 
document servers. 

54. An apparatus for processing client requests received in 
a server system over a communication network, the appa- 
ratus comprising: 

a memory for storing a set of redirection probabilities; and 

a processor operative to determine a distribution of a set 
of documents over a plurality of servers based at least 
in part on access rates of at least a subset of the 
documents; to compute the set of redirection probabili- 
ties based on the distribution; to receive a client 
request; and to redirect the client request to one of the 
plurality of document servers based on the set of 
redirection probabilities. 

55. A method of processing client requests received in a 
server system over a communication network, the method 
comprising the steps of: 

determining a distribution of a set of documents over a 
plurality of servers of the server system based at least 
in part on access rates of at least a subset of the 
documents; 

computing a set of redirection probabilities based on the 
distribution; 

storing the set of redirection probabilities in a memory of 
the system; 

receiving a client request; and 

routing the client request to one of the plurality of 
document servers based on the set of redirection prob- 
abilities. 

56. A method of processing client requests received in a 
server system over a communication network, the method 
comprising the steps of: 

determining a distribution of a set of documents over a 
plurality of document servers based at least in part on 
access rates of at least a subset of the documents; 

computing a set of redirection probabilities based on the 
distribution; 

utilizing the distribution to compute a set of redirection 
probabilities; and 

routing a client request to one of the plurality of document 
servers based on the set of redirection probabilities. 

57. A method of processing client requests received in a 
server system over a communication network, the method 
comprising the steps of: 

determining a distribution of a set of documents over a 
plurality of document servers of the server system 
based at least in part on access rates of at least a subset 
of the documents; 

computing a set of redirection probabilities based on the 
distribution; 

routing a client request from a redirection server of the 
server system to one of the plurality of document 
servers in the system based on the set of redirection 
probabilities; and 

recomputing the redirection probabilities after a change in 
conditions associated with the server system. 

* * * * * 






